Overview

Dataset statistics

Number of variables: 23
Number of observations: 165086
Missing cells: 649526
Missing cells (%): 17.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 69.9 MiB
Average record size in memory: 443.8 B
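
The missing-cell percentage follows from the counts above, because the total number of cells is the number of variables times the number of observations. A minimal check in Python, using only the figures reported here (the variable names are illustrative and not part of the report):

```python
# Figures copied from the dataset statistics above.
n_variables = 23
n_observations = 165086
missing_cells = 649526

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
```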

Variable types

NUM: 15
CAT: 6
DATE: 2

Reproduction

Analysis started: 2021-05-20 18:42:36.925810
Analysis finished: 2021-05-20 18:43:54.684433
Duration: 1 minute and 17.76 seconds
Version: pandas-profiling v2.7.1
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

RES_ADDR_CITY has a high cardinality: 1067 distinct values [High cardinality]
original_matur_years is highly correlated with years_to_matur [High correlation]
years_to_matur is highly correlated with original_matur_years [High correlation]
age_loan_years is highly correlated with id [High correlation]
id is highly correlated with age_loan_years [High correlation]
RES_ADDR_CITY has 58952 (35.7%) missing values [Missing]
EDUCATION has 105340 (63.8%) missing values [Missing]
NUMBER_OF_FAMILY_MEMBERS has 46504 (28.2%) missing values [Missing]
RESIDENTAL_STATUS has 112400 (68.1%) missing values [Missing]
MARITAL_STATUS has 87541 (53.0%) missing values [Missing]
FIXED_MONTHLY_EXPENSES has 46504 (28.2%) missing values [Missing]
Flat_House has 60334 (36.5%) missing values [Missing]
OPEN_DATE has 38943 (23.6%) missing values [Missing]
INCOME_houshold has 46504 (28.2%) missing values [Missing]
dpd has 46504 (28.2%) missing values [Missing]
prepaid_amount is highly skewed (γ1 = 21.41890118) [Skewed]
dpd is highly skewed (γ1 = 37.57574973) [Skewed]
planned_installments has 2742 (1.7%) zeros [Zeros]
prepaid_amount has 145535 (88.2%) zeros [Zeros]
NUMBER_OF_FAMILY_MEMBERS has 61233 (37.1%) zeros [Zeros]
FIXED_MONTHLY_EXPENSES has 83787 (50.8%) zeros [Zeros]
INCOME_houshold has 75819 (45.9%) zeros [Zeros]
dpd has 105057 (63.6%) zeros [Zeros]
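
The skewness values (γ1) quoted in the warnings can be illustrated with a small sketch. It assumes the report uses the adjusted sample skewness (the moment ratio m3 / m2^1.5 with the usual small-sample correction), which is an assumption about the tool and not something the report states. The function name and the sample data are hypothetical.

```python
import math


def skewness(values, adjusted=True):
    """Sample skewness; adjusted=True applies the usual small-sample correction."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    if adjusted:
        # The correction factor is only defined for n >= 3.
        return g1 * math.sqrt(n * (n - 1)) / (n - 2)
    return g1


# Hypothetical zero-heavy column: mostly zeros with one large value,
# loosely mimicking the shape of prepaid_amount or dpd.
sample = [0, 0, 0, 0, 10]
print(skewness(sample, adjusted=False))
print(skewness(sample))
```

A column dominated by zeros with a few very large values, as flagged for prepaid_amount and dpd, produces a large positive skewness for the same reason this toy sample does.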

Variables

date
Date

Distinct count: 50
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
Minimum: 2016-01-31 00:00:00
Maximum: 2020-02-29 00:00:00
Histogram

date_str
Real number (ℝ≥0)

Distinct count: 50
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20173678.835915826
Minimum: 20160131
Maximum: 20200229
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 20160131
5-th percentile: 20160331
Q1: 20161130
Median: 20170831
Q3: 20180831
95-th percentile: 20191031
Maximum: 20200229
Range: 40098
Interquartile range (IQR): 19701

Descriptive statistics

Standard deviation: 11124.96164
Coefficient of variation (CV): 0.0005514592421
Kurtosis: -0.919425036
Mean: 20173678.84
Median Absolute Deviation (MAD): 9900
Skewness: 0.3972513461
Sum: 3.330391944e+12
Variance: 123764771.5
Histogram with fixed size bins (bins=10)
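
Several of the statistics reported for each numeric variable are derived from one another: the range is the maximum minus the minimum, the IQR is Q3 minus Q1, the variance is the squared standard deviation, and the CV is the standard deviation divided by the mean. A short consistency check on the date_str figures above (a sketch; the tolerances are arbitrary choices):

```python
import math

# Values copied from the date_str statistics above.
minimum, maximum = 20160131, 20200229
q1, q3 = 20161130, 20180831
mean = 20173678.84
std = 11124.96164

value_range = maximum - minimum  # reported: 40098
iqr = q3 - q1                    # reported: 19701
cv = std / mean                  # reported: 0.0005514592421
variance = std ** 2              # reported: 123764771.5

print(value_range, iqr, cv, variance)
```

The same relationships can be checked for any of the other numeric variables in the report.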
ValueCountFrequency (%) 
20161130 4264 2.6%
 
20161031 4264 2.6%
 
20161231 4263 2.6%
 
20170131 4250 2.6%
 
20160930 4234 2.6%
 
20170228 4225 2.6%
 
20170331 4219 2.6%
 
20160831 4219 2.6%
 
20170430 4198 2.5%
 
20160731 4186 2.5%
 
Other values (40) 122764 74.4%
 
ValueCountFrequency (%) 
20160131 3887 2.4%
 
20160229 3949 2.4%
 
20160331 3993 2.4%
 
20160430 4046 2.5%
 
20160531 4102 2.5%
 
ValueCountFrequency (%) 
20200229 1552 0.9%
 
20200131 1636 1.0%
 
20191231 1696 1.0%
 
20191130 1778 1.1%
 
20191031 1864 1.1%
 

id
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 5336
Unique (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2351.100723259392
Minimum: 1
Maximum: 5355
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 1
5-th percentile: 217
Q1: 1108
Median: 2230
Q3: 3551
95-th percentile: 4742
Maximum: 5355
Range: 5354
Interquartile range (IQR): 2443

Descriptive statistics

Standard deviation: 1448.041013
Coefficient of variation (CV): 0.6158991822
Kurtosis: -1.094258939
Mean: 2351.100723
Median Absolute Deviation (MAD): 1213
Skewness: 0.1877568478
Sum: 388133814
Variance: 2096822.775
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
846 50 < 0.1%
 
1868 50 < 0.1%
 
3407 50 < 0.1%
 
2033 50 < 0.1%
 
2850 50 < 0.1%
 
2126 50 < 0.1%
 
2466 50 < 0.1%
 
1649 50 < 0.1%
 
394 50 < 0.1%
 
2382 50 < 0.1%
 
Other values (5326) 164586 99.7%
 
ValueCountFrequency (%) 
1 41 < 0.1%
 
2 36 < 0.1%
 
3 22 < 0.1%
 
4 39 < 0.1%
 
5 35 < 0.1%
 
ValueCountFrequency (%) 
5355 11 < 0.1%
 
5354 11 < 0.1%
 
5353 11 < 0.1%
 
5352 10 < 0.1%
 
5351 11 < 0.1%
 

years_to_matur
Real number (ℝ)

HIGH CORRELATION
Distinct count: 3218
Unique (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.61764032080249
Minimum: -0.1
Maximum: 36.97
Zeros: 5
Zeros (%): < 0.1%
Memory size: 1.3 MiB

Quantile statistics

Minimum: -0.1
5-th percentile: 1.17
Q1: 3.08
Median: 6.88
Q3: 14.86
95-th percentile: 25.04
Maximum: 36.97
Range: 37.07
Interquartile range (IQR): 11.78

Descriptive statistics

Standard deviation: 7.987546896
Coefficient of variation (CV): 0.8305100451
Kurtosis: -0.3379568931
Mean: 9.617640321
Median Absolute Deviation (MAD): 4.47
Skewness: 0.9012519735
Sum: 1587737.77
Variance: 63.80090542
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
2.25 371 0.2%
 
1.92 365 0.2%
 
2.67 359 0.2%
 
1.67 359 0.2%
 
2.92 340 0.2%
 
2.42 332 0.2%
 
1.75 325 0.2%
 
2.5 324 0.2%
 
2 323 0.2%
 
2.75 319 0.2%
 
Other values (3208) 161669 97.9%
 
ValueCountFrequency (%) 
-0.1 1 < 0.1%
 
-0.03 1 < 0.1%
 
-0.02 1 < 0.1%
 
0 5 < 0.1%
 
0.01 3 < 0.1%
 
ValueCountFrequency (%) 
36.97 1 < 0.1%
 
36.88 1 < 0.1%
 
36.8 1 < 0.1%
 
36.72 1 < 0.1%
 
36.63 1 < 0.1%
 

age_owner_years
Real number (ℝ≥0)

Distinct count: 4721
Unique (%): 2.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47.2771499703185
Minimum: 21.73
Maximum: 75.98
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 21.73
5-th percentile: 33.74
Q1: 42.58
Median: 49.34
Q3: 50.76
95-th percentile: 59.97
Maximum: 75.98
Range: 54.25
Interquartile range (IQR): 8.18

Descriptive statistics

Standard deviation: 7.431970013
Coefficient of variation (CV): 0.1572000431
Kurtosis: 0.5404310987
Mean: 47.27714997
Median Absolute Deviation (MAD): 2.12
Skewness: -0.187770892
Sum: 7804795.58
Variance: 55.23417827
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
50.76 2587 1.6%
 
50.67 2567 1.6%
 
50.59 2549 1.5%
 
50.51 2534 1.5%
 
50.42 2512 1.5%
 
50.34 2489 1.5%
 
50.26 2467 1.5%
 
50.18 2434 1.5%
 
50.09 2392 1.4%
 
50.01 2349 1.4%
 
Other values (4711) 140206 84.9%
 
ValueCountFrequency (%) 
21.73 1 < 0.1%
 
21.81 1 < 0.1%
 
21.89 1 < 0.1%
 
21.98 1 < 0.1%
 
22.06 1 < 0.1%
 
ValueCountFrequency (%) 
75.98 1 < 0.1%
 
75.89 1 < 0.1%
 
75.81 1 < 0.1%
 
75.73 1 < 0.1%
 
75.64 1 < 0.1%
 

original_matur_years
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 1448
Unique (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.336914335558438
Minimum: 2.46
Maximum: 41.99
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 2.46
5-th percentile: 3.76
Q1: 5.04
Median: 10.28
Q3: 20.57
95-th percentile: 30.9
Maximum: 41.99
Range: 39.53
Interquartile range (IQR): 15.53

Descriptive statistics

Standard deviation: 9.334075932
Coefficient of variation (CV): 0.6510519428
Kurtosis: -0.7759673448
Mean: 14.33691434
Median Absolute Deviation (MAD): 5.29
Skewness: 0.6538165744
Sum: 2366823.84
Variance: 87.12497351
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
5.01 5022 3.0%
 
5.04 4474 2.7%
 
10.01 3901 2.4%
 
5.02 3878 2.3%
 
5 3859 2.3%
 
4.99 3721 2.3%
 
4.98 3502 2.1%
 
10.04 3380 2.0%
 
5.03 2889 1.7%
 
9.99 2599 1.6%
 
Other values (1438) 127861 77.5%
 
ValueCountFrequency (%) 
2.46 26 < 0.1%
 
2.72 16 < 0.1%
 
2.9 12 < 0.1%
 
2.96 595 0.4%
 
2.97 371 0.2%
 
ValueCountFrequency (%) 
41.99 50 < 0.1%
 
41.96 25 < 0.1%
 
41.41 38 < 0.1%
 
40.61 45 < 0.1%
 
40.5 29 < 0.1%
 

client_rate
Real number (ℝ≥0)

Distinct count: 495
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.04750170638333959
Minimum: 0.0206
Maximum: 0.098
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0.0206
5-th percentile: 0.0263
Q1: 0.0334
Median: 0.043
Q3: 0.0579
95-th percentile: 0.0829
Maximum: 0.098
Range: 0.0774
Interquartile range (IQR): 0.0245

Descriptive statistics

Standard deviation: 0.01767478402
Coefficient of variation (CV): 0.3720873494
Kurtosis: 0.5266817117
Mean: 0.04750170638
Median Absolute Deviation (MAD): 0.0119
Skewness: 0.9385831883
Sum: 7841.8667
Variance: 0.0003123979902
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
0.0579 12670 7.7%
 
0.068 5443 3.3%
 
0.0529 4487 2.7%
 
0.073 3278 2.0%
 
0.0429 3138 1.9%
 
0.053 3012 1.8%
 
0.0389 2966 1.8%
 
0.098 2823 1.7%
 
0.0279 2474 1.5%
 
0.0479 2269 1.4%
 
Other values (485) 122526 74.2%
 
ValueCountFrequency (%) 
0.0206 100 0.1%
 
0.0207 220 0.1%
 
0.0208 250 0.2%
 
0.021 47 < 0.1%
 
0.0211 93 0.1%
 
ValueCountFrequency (%) 
0.098 2823 1.7%
 
0.0979 1462 0.9%
 
0.0978 82 < 0.1%
 
0.0977 73 < 0.1%
 
0.0974 51 < 0.1%
 

original_volume
Real number (ℝ≥0)

Distinct count: 2152
Unique (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 172699.54929830512
Minimum: 570.35
Maximum: 3060751.59
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 570.35
5-th percentile: 15140.2
Q1: 55686.9
Median: 122236.38
Q3: 225029
95-th percentile: 518500
Maximum: 3060751.59
Range: 3060181.24
Interquartile range (IQR): 169342.1

Descriptive statistics

Standard deviation: 178797.8389
Coefficient of variation (CV): 1.035311555
Kurtosis: 26.25599215
Mean: 172699.5493
Median Absolute Deviation (MAD): 76799.12
Skewness: 3.376015079
Sum: 2.85102778e+10
Variance: 3.196866721e+10
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
103700 7327 4.4%
 
207400 4461 2.7%
 
155550 4347 2.6%
 
51850 4019 2.4%
 
82960 3215 1.9%
 
72590 3202 1.9%
 
124440 3138 1.9%
 
62220 2616 1.6%
 
41480 2408 1.5%
 
259250 2329 1.4%
 
Other values (2142) 128024 77.5%
 
ValueCountFrequency (%) 
570.35 2 < 0.1%
 
1037 33 < 0.1%
 
1921.98 6 < 0.1%
 
2696.2 35 < 0.1%
 
3422.1 2 < 0.1%
 
ValueCountFrequency (%) 
3060751.59 37 < 0.1%
 
2074000 50 < 0.1%
 
1742160 35 < 0.1%
 
1659200 34 < 0.1%
 
1555500 53 < 0.1%
 

age_loan_years
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 1578
Unique (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.719292913996342
Minimum: 0.0
Maximum: 17.58
Zeros: 63
Zeros (%): < 0.1%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.56
Q1: 2.23
Median: 4.29
Q3: 6.94
95-th percentile: 10.01
Maximum: 17.58
Range: 17.58
Interquartile range (IQR): 4.71

Descriptive statistics

Standard deviation: 3.019929133
Coefficient of variation (CV): 0.639911357
Kurtosis: -0.4249606393
Mean: 4.719292914
Median Absolute Deviation (MAD): 2.3
Skewness: 0.5059218032
Sum: 779089.19
Variance: 9.119971967
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
2.42 345 0.2%
 
2.67 338 0.2%
 
2.59 338 0.2%
 
2.25 327 0.2%
 
1.84 312 0.2%
 
2.84 310 0.2%
 
1.92 309 0.2%
 
2.5 307 0.2%
 
2.35 303 0.2%
 
1.67 302 0.2%
 
Other values (1568) 161895 98.1%
 
ValueCountFrequency (%) 
0 63 < 0.1%
 
0.01 81 < 0.1%
 
0.02 119 0.1%
 
0.03 90 0.1%
 
0.04 105 0.1%
 
ValueCountFrequency (%) 
17.58 1 < 0.1%
 
17.5 1 < 0.1%
 
17.41 1 < 0.1%
 
17.33 1 < 0.1%
 
17.25 1 < 0.1%
 

outstanding_volume
Real number (ℝ≥0)

Distinct count: 161526
Unique (%): 97.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 102313.25092878864
Minimum: 0.0
Maximum: 1785948.22
Zeros: 55
Zeros (%): < 0.1%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 5298.575
Q1: 24665.95
Median: 63312.17
Q3: 137070.3225
95-th percentile: 330735.78
Maximum: 1785948.22
Range: 1785948.22
Interquartile range (IQR): 112404.3725

Descriptive statistics

Standard deviation: 120477.5616
Coefficient of variation (CV): 1.177536248
Kurtosis: 21.06754504
Mean: 102313.2509
Median Absolute Deviation (MAD): 46492
Skewness: 3.293259992
Sum: 1.689048534e+10
Variance: 1.451484285e+10
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
103700 103 0.1%
 
207400 90 0.1%
 
155550 81 < 0.1%
 
72590 57 < 0.1%
 
0 55 < 0.1%
 
1037 38 < 0.1%
 
186660 38 < 0.1%
 
82960 34 < 0.1%
 
259250 34 < 0.1%
 
165920 34 < 0.1%
 
Other values (161516) 164522 99.7%
 
ValueCountFrequency (%) 
0 55 < 0.1%
 
0.01 2 < 0.1%
 
0.02 2 < 0.1%
 
0.03 2 < 0.1%
 
0.04 1 < 0.1%
 
ValueCountFrequency (%) 
1785948.22 1 < 0.1%
 
1781907.83 1 < 0.1%
 
1777677.25 1 < 0.1%
 
1773611.29 1 < 0.1%
 
1769390.49 1 < 0.1%
 

planned_installments
Real number (ℝ≥0)

ZEROS
Distinct count: 98231
Unique (%): 59.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 836.7942237379303
Minimum: 0.0
Maximum: 19384.34
Zeros: 2742
Zeros (%): 1.7%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 108.7425
Q1: 313.92
Median: 570.55
Q3: 1009.65
95-th percentile: 2402.365
Maximum: 19384.34
Range: 19384.34
Interquartile range (IQR): 695.73

Descriptive statistics

Standard deviation: 980.9166328
Coefficient of variation (CV): 1.172231601
Kurtosis: 47.54744744
Mean: 836.7942237
Median Absolute Deviation (MAD): 307.72
Skewness: 5.135721265
Sum: 138143011.2
Variance: 962197.4405
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
0 2742 1.7%
 
864.16 474 0.3%
 
1296.25 154 0.1%
 
432.08 80 < 0.1%
 
518.5 75 < 0.1%
 
432.09 74 < 0.1%
 
950.59 58 < 0.1%
 
430.83 52 < 0.1%
 
1728.34 50 < 0.1%
 
549.92 49 < 0.1%
 
Other values (98221) 161278 97.7%
 
ValueCountFrequency (%) 
0 2742 1.7%
 
0.01 11 < 0.1%
 
0.02 1 < 0.1%
 
0.04 3 < 0.1%
 
0.39 1 < 0.1%
 
ValueCountFrequency (%) 
19384.34 1 < 0.1%
 
19360.05 1 < 0.1%
 
18568.69 1 < 0.1%
 
18562.81 1 < 0.1%
 
18318.02 1 < 0.1%
 

prepaid_amount
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 7523
Unique (%): 4.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1800.0924453920986
Minimum: 0.0
Maximum: 1101604.46
Zeros: 145535
Zeros (%): 88.2%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 5401.6125
Maximum: 1101604.46
Range: 1101604.46
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 13730.30929
Coefficient of variation (CV): 7.627557864
Kurtosis: 798.9204637
Mean: 1800.092445
Median Absolute Deviation (MAD): 0
Skewness: 21.41890118
Sum: 297170061.4
Variance: 188521393.2
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
0 145535 88.2%
 
10370 961 0.6%
 
1037 884 0.5%
 
5185 800 0.5%
 
2074 711 0.4%
 
518.5 578 0.4%
 
20740 527 0.3%
 
3111 461 0.3%
 
4148 337 0.2%
 
1555.5 304 0.2%
 
Other values (7513) 13988 8.5%
 
ValueCountFrequency (%) 
0 145535 88.2%
 
0.44 1 < 0.1%
 
1.04 1 < 0.1%
 
2.43 1 < 0.1%
 
3.62 1 < 0.1%
 
ValueCountFrequency (%) 
1101604.46 1 < 0.1%
 
839454.77 1 < 0.1%
 
773640 1 < 0.1%
 
647170.91 1 < 0.1%
 
595463.71 1 < 0.1%
 

type
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
ValueCountFrequency (%) 
cl_fix 78545 47.6%
 
mtg_fix 76755 46.5%
 
mtg_grn_fix 9786 5.9%
 

Length

Max length: 11
Mean length: 6.761330458
Min length: 6
ValueCountFrequency (%) 
Lowercase_Letter 10 90.9%
 
Connector_Punctuation 1 9.1%
 
ValueCountFrequency (%) 
Latin 10 90.9%
 
Common 1 9.1%
 
ValueCountFrequency (%) 
ASCII 11 100.0%
 

RES_ADDR_CITY
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 1067
Unique (%): 1.0%
Missing: 58952
Missing (%): 35.7%
Memory size: 1.3 MiB
ValueCountFrequency (%) 
WARSZAWA 6896 4.2%
 
KRAKÓW 4043 2.4%
 
WROCŁAW 2267 1.4%
 
KATOWICE 2222 1.3%
 
TYCHY 1731 1.0%
 
ŁÓDŹ 1676 1.0%
 
SZCZECIN 1671 1.0%
 
BIELSKO-BIAŁA 1617 1.0%
 
GLIWICE 1559 0.9%
 
CZĘSTOCHOWA 1538 0.9%
 
Other values (1057) 80914 49.0%
 
(Missing) 58952 35.7%
 

Length

Max length: 26
Mean length: 6.477236107
Min length: 3
ValueCountFrequency (%) 
Uppercase_Letter 33 47.8%
 
Lowercase_Letter 30 43.5%
 
Other_Punctuation 4 5.8%
 
Space_Separator 1 1.4%
 
Dash_Punctuation 1 1.4%
 
ValueCountFrequency (%) 
Latin 63 91.3%
 
Common 6 8.7%
 
ValueCountFrequency (%) 
ASCII 52 75.4%
 
Latin Ext A 15 21.7%
 
Latin 1 Sup 2 2.9%
 

EDUCATION
Categorical

MISSING
Distinct count: 7
Unique (%): < 0.1%
Missing: 105340
Missing (%): 63.8%
Memory size: 1.3 MiB
ValueCountFrequency (%) 
H 29466 17.8%
 
S 14894 9.0%
 
D 7044 4.3%
 
R 3990 2.4%
 
L 2142 1.3%
 
N 1796 1.1%
 
P 414 0.3%
 
(Missing) 105340 63.8%
 

Length

Max length: 3
Mean length: 2.276183323
Min length: 1
ValueCountFrequency (%) 
Uppercase_Letter 7 77.8%
 
Lowercase_Letter 2 22.2%
 
ValueCountFrequency (%) 
Latin 9 100.0%
 
ValueCountFrequency (%) 
ASCII 9 100.0%
 

NUMBER_OF_FAMILY_MEMBERS
Real number (ℝ≥0)

MISSING
ZEROS
Distinct count: 9
Unique (%): < 0.1%
Missing: 46504
Missing (%): 28.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.206481590797929
Minimum: 0.0
Maximum: 50.0
Zeros: 61233
Zeros (%): 37.1%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 2
95-th percentile: 4
Maximum: 50
Range: 50
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.951938738
Coefficient of variation (CV): 1.617876934
Kurtosis: 271.245782
Mean: 1.206481591
Median Absolute Deviation (MAD): 0
Skewness: 11.32695359
Sum: 143067
Variance: 3.810064835
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
0 61233 37.1%
 
2 19107 11.6%
 
1 14389 8.7%
 
3 11475 7.0%
 
4 10129 6.1%
 
5 1833 1.1%
 
6 228 0.1%
 
8 105 0.1%
 
50 83 0.1%
 
(Missing) 46504 28.2%
 
ValueCountFrequency (%) 
0 61233 37.1%
 
1 14389 8.7%
 
2 19107 11.6%
 
3 11475 7.0%
 
4 10129 6.1%
 
ValueCountFrequency (%) 
50 83 0.1%
 
8 105 0.1%
 
6 228 0.1%
 
5 1833 1.1%
 
4 10129 6.1%
 

RESIDENTAL_STATUS
Categorical

MISSING
Distinct count: 9
Unique (%): < 0.1%
Missing: 112400
Missing (%): 68.1%
Memory size: 1.3 MiB
ValueCountFrequency (%) 
home owner 16242 9.8%
 
owner of another apartment 13761 8.3%
 
home owner / owner of cooperative house 9727 5.9%
 
live with parents 5258 3.2%
 
other 2296 1.4%
 
none 1898 1.1%
 
najemca prywatne 1531 0.9%
 
private tenant 1032 0.6%
 
tenant municipal cooperative company 941 0.6%
 
(Missing) 112400 68.1%
 

Length

Max length: 39
Mean length: 8.589686588
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 19 90.5%
 
Space_Separator 1 4.8%
 
Other_Punctuation 1 4.8%
 
ValueCountFrequency (%) 
Latin 19 90.5%
 
Common 2 9.5%
 
ValueCountFrequency (%) 
ASCII 21 100.0%
 

MARITAL_STATUS
Categorical

MISSING
Distinct count: 6
Unique (%): < 0.1%
Missing: 87541
Missing (%): 53.0%
Memory size: 1.3 MiB
ValueCountFrequency (%) 
M 57594 34.9%
 
S 13470 8.2%
 
D 5035 3.0%
 
W 746 0.5%
 
I 541 0.3%
 
P 159 0.1%
 
(Missing) 87541 53.0%
 

Length

Max length: 3
Mean length: 2.060550259
Min length: 1
ValueCountFrequency (%) 
Uppercase_Letter 6 75.0%
 
Lowercase_Letter 2 25.0%
 
ValueCountFrequency (%) 
Latin 8 100.0%
 
ValueCountFrequency (%) 
ASCII 8 100.0%
 

FIXED_MONTHLY_EXPENSES
Real number (ℝ≥0)

MISSING
ZEROS
Distinct count: 144
Unique (%): 0.1%
Missing: 46504
Missing (%): 28.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 339.5313243999933
Minimum: 0.0
Maximum: 11700.0
Zeros: 83787
Zeros (%): 50.8%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 390
95-th percentile: 1755
Maximum: 11700
Range: 11700
Interquartile range (IQR): 390

Descriptive statistics

Standard deviation: 755.4855576
Coefficient of variation (CV): 2.225083529
Kurtosis: 31.71659
Mean: 339.5313244
Median Absolute Deviation (MAD): 0
Skewness: 4.193772584
Sum: 40262303.51
Variance: 570758.4277
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
0 83787 50.8%
 
650 4575 2.8%
 
1300 3601 2.2%
 
390 2636 1.6%
 
780 2197 1.3%
 
130 1729 1.0%
 
1040 1719 1.0%
 
260 1671 1.0%
 
1950 1423 0.9%
 
2600 1404 0.9%
 
Other values (134) 13840 8.4%
 
(Missing) 46504 28.2%
 
ValueCountFrequency (%) 
0 83787 50.8%
 
13 11 < 0.1%
 
26 44 < 0.1%
 
65 661 0.4%
 
78 91 0.1%
 
ValueCountFrequency (%) 
11700 34 < 0.1%
 
7800 96 0.1%
 
6760 23 < 0.1%
 
6500 124 0.1%
 
5443.88 28 < 0.1%
 

Flat_House
Categorical

MISSING
Distinct count: 2
Unique (%): < 0.1%
Missing: 60334
Missing (%): 36.5%
Memory size: 1.3 MiB
ValueCountFrequency (%) 
H 64228 38.9%
 
F 40524 24.5%
 
(Missing) 60334 36.5%
 

Length

Max length: 3
Mean length: 1.730940237
Min length: 1
ValueCountFrequency (%) 
Uppercase_Letter 2 50.0%
 
Lowercase_Letter 2 50.0%
 
ValueCountFrequency (%) 
Latin 4 100.0%
 
ValueCountFrequency (%) 
ASCII 4 100.0%
 

OPEN_DATE
Date

MISSING
Distinct count: 2049
Unique (%): 1.6%
Missing: 38943
Missing (%): 23.6%
Memory size: 1.3 MiB
Minimum: 1991-08-01 00:00:00
Maximum: 2021-04-10 00:00:00
Histogram

INCOME_houshold
Real number (ℝ)

MISSING
ZEROS
Distinct count: 1234
Unique (%): 1.0%
Missing: 46504
Missing (%): 28.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 4805.0740982273865
Minimum: -28130.388
Maximum: 335624.652
Zeros: 75819
Zeros (%): 45.9%
Memory size: 1.3 MiB

Quantile statistics

Minimum: -28130.388
5-th percentile: 0
Q1: 0
Median: 0
Q3: 4953.384
95-th percentile: 21909.912
Maximum: 335624.652
Range: 363755.04
Interquartile range (IQR): 4953.384

Descriptive statistics

Standard deviation: 13132.45357
Coefficient of variation (CV): 2.733038721
Kurtosis: 154.9152487
Mean: 4805.074098
Median Absolute Deviation (MAD): 0
Skewness: 8.900384591
Sum: 569795296.7
Variance: 172461336.8
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
0 75819 45.9%
 
3000 231 0.1%
 
7200 201 0.1%
 
6960 176 0.1%
 
3600 165 0.1%
 
2640 141 0.1%
 
8400 133 0.1%
 
2520 126 0.1%
 
3240 112 0.1%
 
4200 111 0.1%
 
Other values (1224) 41367 25.1%
 
(Missing) 46504 28.2%
 
ValueCountFrequency (%) 
-28130.388 41 < 0.1%
 
-3530.28 30 < 0.1%
 
-18.948 37 < 0.1%
 
0 75819 45.9%
 
720 12 < 0.1%
 
ValueCountFrequency (%) 
335624.652 39 < 0.1%
 
139273.416 32 < 0.1%
 
134381.7 66 < 0.1%
 
133655.664 74 < 0.1%
 
108072.012 8 < 0.1%
 

dpd
Real number (ℝ≥0)

MISSING
SKEWED
ZEROS
Distinct count: 178
Unique (%): 0.2%
Missing: 46504
Missing (%): 28.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.269990386399285
Minimum: 0.0
Maximum: 997.0
Zeros: 105057
Zeros (%): 63.6%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 3
Maximum: 997
Range: 997
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 20.53734021
Coefficient of variation (CV): 16.17125643
Kurtosis: 1524.30909
Mean: 1.269990386
Median Absolute Deviation (MAD): 0
Skewness: 37.57574973
Sum: 150598
Variance: 421.7823428
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
0 105057 63.6%
 
1 5012 3.0%
 
2 1683 1.0%
 
3 1291 0.8%
 
4 807 0.5%
 
5 660 0.4%
 
6 469 0.3%
 
7 323 0.2%
 
8 309 0.2%
 
10 280 0.2%
 
Other values (168) 2691 1.6%
 
(Missing) 46504 28.2%
 
ValueCountFrequency (%) 
0 105057 63.6%
 
1 5012 3.0%
 
2 1683 1.0%
 
3 1291 0.8%
 
4 807 0.5%
 
ValueCountFrequency (%) 
997 2 < 0.1%
 
991 2 < 0.1%
 
990 1 < 0.1%
 
986 1 < 0.1%
 
978 3 < 0.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
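
The calculation described above can be sketched in plain Python. The function name and the sample data are illustrative; the sketch assumes both inputs have the same length and non-zero variance.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2 * v + 3 for v in x]      # linear, positive slope
z = [-0.5 * v + 10 for v in x]  # linear, negative slope, different scale

print(pearson_r(x, y))
print(pearson_r(x, z))
```

Both y and z are exact linear functions of x, so the magnitude of r is 1 in each case regardless of slope or offset, which illustrates the invariance to location and scale.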

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
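
As a sketch of this procedure, ρ can be computed by ranking each variable and applying Pearson's formula to the ranks. Giving tied values the average of their ranks is an assumption made here, since the text does not say how ties are handled; all names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance over the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear

print(spearman_rho(x, y))
print(pearson_r(x, y))
```

For this monotonic but nonlinear relationship the ranks agree exactly, so ρ reaches its maximum while Pearson's r on the raw values stays below 1.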

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
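
The pair-counting formula above can be sketched directly. This follows the text literally, dividing by the total number of pairs (often called tau-a); library implementations commonly use a tie-adjusted variant, so results can differ when the data contain ties. The names and data are illustrative.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total pairs, as described in the text."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1  # both variables move in the same direction
        elif s < 0:
            discordant += 1  # the variables move in opposite directions
        # s == 0 is a tie: it counts toward the total but toward neither group
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]

print(kendall_tau(x, y))
```

This enumerates every pair, so it runs in quadratic time; that is fine for illustration but not for a dataset of the size profiled here.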

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
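
A sketch of a bias-corrected Cramér's V for a contingency table is given below. It uses the commonly cited form of Bergsma's correction; whether the profiling tool's implementation matches it in every detail is an assumption. The function name is illustrative, and the sketch assumes every row and column of the table has a non-zero total.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


perfect = [[10, 0], [0, 10]]     # each row category maps to one column category
independent = [[5, 5], [5, 5]]   # row and column categories are unrelated

print(cramers_v_corrected(perfect))
print(cramers_v_corrected(independent))
```

The two toy tables cover the extremes of the 0 to 1 range described above: perfect association and independence.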

Missing values

Sample

First rows

index | date | date_str | id | years_to_matur | age_owner_years | original_matur_years | client_rate | original_volume | age_loan_years | outstanding_volume | planned_installments | prepaid_amount | type | RES_ADDR_CITY | EDUCATION | NUMBER_OF_FAMILY_MEMBERS | RESIDENTAL_STATUS | MARITAL_STATUS | FIXED_MONTHLY_EXPENSES | Flat_House | OPEN_DATE | INCOME_houshold | dpd
0 | 2016-10-31 | 20161031 | 1 | 11.49 | 44.22 | 25.01 | 0.0342 | 51850.0 | 13.53 | 33397.66 | 196.04 | 0.0 | mtg_fix | JELENIA GÓRA | S | 4.0 | owner of another apartment | M | 1040.0 | F | 2016-10-31 | 0.0 | 0.0
1 | 2016-11-30 | 20161130 | 1 | 11.41 | 44.30 | 25.01 | 0.0342 | 51850.0 | 13.61 | 33201.62 | 202.84 | 0.0 | mtg_fix | JELENIA GÓRA | S | 4.0 | owner of another apartment | M | 1040.0 | F | 2016-10-31 | 0.0 | 5.0
2 | 2016-12-31 | 20161231 | 1 | 11.32 | 44.38 | 25.01 | 0.0342 | 51850.0 | 13.69 | 32998.78 | 197.21 | 0.0 | mtg_fix | JELENIA GÓRA | S | 4.0 | owner of another apartment | M | 1040.0 | F | 2016-10-31 | 0.0 | 0.0
3 | 2017-01-31 | 20170131 | 1 | 11.24 | 44.46 | 25.01 | 0.0342 | 51850.0 | 13.77 | 32801.57 | 201.43 | 0.0 | mtg_fix | JELENIA GÓRA | S | 4.0 | owner of another apartment | M | 1040.0 | F | 2016-10-31 | 0.0 | 0.0
4 | 2017-02-28 | 20170228 | 1 | 11.16 | 44.55 | 25.01 | 0.0342 | 51850.0 | 13.86 | 32600.15 | 198.97 | 0.0 | mtg_fix | JELENIA GÓRA | S | 4.0 | owner of another apartment | M | 1040.0 | F | 2016-10-31 | 0.0 | 0.0
5 | 2017-03-31 | 20170331 | 1 | 11.07 | 44.63 | 25.01 | 0.0342 | 51850.0 | 13.94 | 32401.18 | 202.53 | 0.0 | mtg_fix | JELENIA GÓRA | S | 4.0 | owner of another apartment | M | 1040.0 | F | 2016-10-31 | 0.0 | 0.0
6 | 2017-04-30 | 20170430 | 1 | 10.99 | 44.72 | 25.01 | 0.0342 | 51850.0 | 14.02 | 32198.65 | 199.62 | 0.0 | mtg_fix | JELENIA GÓRA | S | 4.0 | owner of another apartment | M | 1040.0 | F | 2016-10-31 | 0.0 | 0.0
7 | 2017-05-31 | 20170531 | 1 | 10.90 | 44.80 | 25.01 | 0.0342 | 51850.0 | 14.11 | 31999.03 | 200.19 | 0.0 | mtg_fix | JELENIA GÓRA | S | 4.0 | owner of another apartment | M | 1040.0 | F | 2016-10-31 | 0.0 | 0.0
8 | 2017-06-30 | 20170630 | 1 | 10.82 | 44.88 | 25.01 | 0.0342 | 51850.0 | 14.19 | 31798.84 | 203.75 | 0.0 | mtg_fix | JELENIA GÓRA | S | 4.0 | owner of another apartment | M | 1040.0 | F | 2016-10-31 | 0.0 | 0.0
9 | 2017-07-31 | 20170731 | 1 | 10.74 | 44.97 | 25.01 | 0.0342 | 51850.0 | 14.28 | 31595.09 | 201.39 | 0.0 | mtg_fix | JELENIA GÓRA | S | 4.0 | owner of another apartment | M | 1040.0 | F | 2016-10-31 | 0.0 | 0.0

Last rows

index | date | date_str | id | years_to_matur | age_owner_years | original_matur_years | client_rate | original_volume | age_loan_years | outstanding_volume | planned_installments | prepaid_amount | type | RES_ADDR_CITY | EDUCATION | NUMBER_OF_FAMILY_MEMBERS | RESIDENTAL_STATUS | MARITAL_STATUS | FIXED_MONTHLY_EXPENSES | Flat_House | OPEN_DATE | INCOME_houshold | dpd
165076 | 2016-12-31 | 20161231 | 4659 | 3.10 | 50.59 | 5.02 | 0.053 | 23442.56 | 1.92 | 13481.62 | 324.59 | 1158.51 | cl_fix | GLIWICE | R | 3.0 | owner of another apartment | M | 0.0 | F | 2000-12-23 | 14459.076 | 0.0
165077 | 2017-01-31 | 20170131 | 4659 | 3.02 | 50.67 | 5.02 | 0.053 | 23442.56 | 2.00 | 11998.52 | 299.83 | 1408.66 | cl_fix | GLIWICE | R | 3.0 | owner of another apartment | M | 0.0 | F | 2000-12-23 | 14459.076 | 0.0
165078 | 2017-02-28 | 20170228 | 4659 | 2.93 | 50.76 | 5.02 | 0.053 | 23442.56 | 2.09 | 10290.04 | 261.67 | 1451.80 | cl_fix | GLIWICE | R | 3.0 | owner of another apartment | M | 0.0 | F | 2000-12-23 | 14459.076 | 1.0
165079 | 2017-03-31 | 20170331 | 4659 | 2.85 | 50.84 | 5.02 | 0.053 | 23442.56 | 2.17 | 8576.57 | 223.50 | 1348.10 | cl_fix | GLIWICE | R | 3.0 | owner of another apartment | M | 0.0 | F | 2000-12-23 | 14459.076 | 0.0
165080 | 2017-04-30 | 20170430 | 4659 | 2.77 | 50.92 | 5.02 | 0.053 | 23442.56 | 2.25 | 7004.97 | 191.07 | 1266.18 | cl_fix | GLIWICE | R | 3.0 | owner of another apartment | M | 0.0 | F | 2000-12-23 | 14459.076 | 0.0
165081 | 2017-05-31 | 20170531 | 4659 | 2.68 | 51.01 | 5.02 | 0.053 | 23442.56 | 2.34 | 5547.72 | 154.43 | 1451.80 | cl_fix | GLIWICE | R | 3.0 | owner of another apartment | M | 0.0 | F | 2000-12-23 | 14459.076 | 0.0
165082 | 2017-06-30 | 20170630 | 4659 | 2.60 | 51.09 | 5.02 | 0.053 | 23442.56 | 2.42 | 3941.49 | 114.98 | 0.00 | cl_fix | GLIWICE | R | 3.0 | owner of another apartment | M | 0.0 | F | 2000-12-23 | 14459.076 | 1.0
165083 | 2017-07-31 | 20170731 | 4659 | 2.52 | 51.18 | 5.02 | 0.053 | 23442.56 | 2.51 | 3826.51 | 115.15 | 725.90 | cl_fix | GLIWICE | R | 3.0 | owner of another apartment | M | 0.0 | F | 2000-12-23 | 14459.076 | 2.0
165084 | 2017-08-31 | 20170831 | 4659 | 2.43 | 51.26 | 5.02 | 0.053 | 23442.56 | 2.59 | 2985.46 | 92.33 | 518.50 | cl_fix | GLIWICE | R | 3.0 | owner of another apartment | M | 0.0 | F | 2000-12-23 | 14459.076 | 0.0
165085 | 2017-09-30 | 20170930 | 4659 | 2.35 | 51.34 | 5.02 | 0.053 | 23442.56 | 2.67 | 2374.63 | 76.98 | 2297.65 | cl_fix | GLIWICE | R | 3.0 | owner of another apartment | M | 0.0 | F | 2000-12-23 | 14459.076 | 0.0